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We present a careful frequentist analysis of one- and two-point statistics of the hot and cold spots 
in the cosmic microwave background (CMB) data obtained by the Wilkinson Microwave Anisotropy 
Probe (WMAP). Our main result is the detection of a new anomaly at the 3-sigma level using 
temperature- weighted extrema correlation functions. We obtain this result using a simple hypothesis 
test which reduces the maximum risk of a false detection to the same level as the claimed significance 
of the test. We further present a detailed study of the robustness of our earlier claim (Larson and 
Wandelt 2004) under variations in the noise model and in the resolution of the map. Free software 
which implements our test is available online. 



I. INTRODUCTION 

Inflation predicts Gaussian random density fluctua- 
tions in the early Universe. This prediction can be tested 
by observing the Cosmic Microwave Background (CMB) 
anisotropies. By observing the CMB, we look back early 
enough to check if these initial conditions of the Uni- 
verse were laid down by inflation. Density fluctuations 
produced in standard models of inflation will cause the 
CMB to be a highly Gaussian isotropic random field — 
a directly testable prediction since the high-resolution, 
all-sky data set of the Wilkinson Microwave Anisotropy 
Probe (WMAP) has become available 0. 

In this paper we check that prediction, continuing our 
work in Q, hereafter LW04. As in that paper, we look 
at the pattern of hot and cold spots (local extrema) seen 
in the CMB sky. We use several methods to determine if 
that pattern is statistically similar to the patterns of hot 
and cold spots that we simulate for Gaussian isotropic 
random fields on the sky. In LW04, we described an 
anomaly of the one-point statistics of hot and cold spot 
excursions. In this paper, we extend this earlier analysis 
and add a detailed study of two-point statistics, the spot- 
spot and temperature-weighted correlation functions of 
hot and cold spots. 

Most frequentist searches for non-Gaussianity follow 
a set recipe: first compare a statistic computed on ob- 
served data to a set of Monte Carlo simulations. An 
assessment of goodness-of-fit then leads to a significance 
level at which Gaussianity is rejected. This assessment is 
often rather informal and prone to false detection. One 
of the main goals of our paper is the presentation of a 
robust hypothesis test that is applicable to all tests of 
Gaussianity in this category. 
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A number of groups have applied this recipe to var- 
ious statistics of the CMB anisotropy. For example, 
Vielva et al. detect non-Gaussianity in the three- and 
four-point wavelet moments , as do Liu and Zhang Q , 
who claim it may be a residual foreground. McEwcn 
et al. investigate wavelets that aren't azimuthally sym- 
metric, and find non-Gaussianity using the skewness 
and kurtosis of their wavelet coefficients 0. Chiang 
et al. detect non-Gaussianity in phase correlations be- 
tween spherical harmonic coefficients 0, 0, Q , and Park 
finds it in the genus Minkowski functional Erik- 
sen et al. find anisotropy in the n-point functions of the 
CMB in different patches of the sky ^(J -Others discuss 
possible methods of detecting non-Gaussianity. Aliaga 
et al. look at studying non-Gaussianity through spheri- 
cal wavelets and "smooth tests of goodness-of-fit" [Tlj . 
Cabella et al. review three methods of studying non- 
Gaussianity: through Minkowski functionals, spherical 
wavelets, and the spherical harmonics |l2j . They propose 
a way to combine these methods. More recently, Cabella 
et al. constrain one generic type of non-Gaussianity us- 
ing spherical wavelets and local curvature of the CMB 
temperature field 01 ■ Komatsu et al. discuss a fast way 
to test the bispectrum for primordial non-Gaussianity in 
the CMB j an d do not detect it . Using a generic 
model for non-Gaussianity, Babich shows that the bispec- 
trum is the best way to constrain it, and therefore claims 
that the bispectrum test used by the WMAP team (Ko- 
matsu et al.) is optimal [l^. Finally, Gaztanaga et al. 
find the CMB to be consistent with Gaussianity when 
considering the two and three-point functions [171 fl8| . 

This paper is laid out as follows. In section 2, we 
briefly discuss the statistical approach underlying all fre- 
quentist blind searches of non-Gaussianity. In section 3, 
we study the one-point statistics of hot and cold spots at 
several map resolutions and for various noise models. In 
Section 4 we remove dependence on the noise model by 
smoothing and describe the spot-spot and temperature 
weighted correlation functions of hot and cold spots at 
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50' and 3 degrees. We conclude in section 5. The Ap- 
pendix describes our robust hypothesis test, as well as 
the free software which implements it. 



II. STATISTICAL ANALYSIS 

The problem we tackle in this paper is how to find 
non-Gaussianity in the CMB when we do not have a spe- 
cific model for the non-Gaussianity. We have a measured 
CMB sky (WMAP data) and the ability to make many 
Gaussian simulations of CMB skies, and we want to de- 
termine if the measured CMB sky looks like it came from 
the distribution of skies we simulate. 

Because we do not have a model for non-Gaussianity, 
we do not have a second distribution of skies to compare 
to the Gaussian distribution. This limits the testing we 
can do to merely determining how large a statistical fluc- 
tuation our currently measured CMB sky is. 

We approach the problem numerically as follows: we 
find some way to reduce the an entire CMB sky to a sin- 
gle number, a single statistic computed on the hot and 
cold spots. We compare the statistic for the measured 
CMB sky to the distribution of statistics for the simu- 
lated CMB skies. If the measured statistic falls signif- 
icantly higher or lower than all of the others, then we 
have a large statistical fluctuation, which we quantify. It 
is then up to the reader to determine if this should be in- 
terpreted as merely an unlikely statistical fluctuation, an 
indication of non-Gaussianity, a residual foreground, or 
a mismatch between our simulations and the actual ob- 
servations. To eliminate that last possibility, we describe 
our simulations in detail. 

The specific simulation methods and statistics we use 
are detailed in the corresponding sections. A detailed 
discussion of what constitutes a statistically significant 
detection is given in the Appendix and accompanying 
freely distributed Mathematica notebook and C code. 



III. MULTI-RESOLUTION ONE-POINT 
ANALYSIS 

A. WMAP Measurement 

We construct a single temperature map of the CMB to 
represent the WMAP team's measurements. Motivated 
by our multi-frequency study in LW04, we take this map 
to be an unweighted average of the four channels of the 
W-band foreground cleaned temperature maps (available 
on the LAMBDA web site|2^|- We use an unweighted 
average of the temperature maps so that we can take the 
(azimuthally symmetric) beam to simply be the average 
of the beams for each of the channels. This is the map 
whose properties we concentrate on simulating. 



B. Simulation Process 

We take legitimate shortcuts in our Gaussian sky sim- 
ulation process. The brute-force way to simulate the 
WMAP team's measurement of a Gaussian CMB sky is 
to simulate the CMB sky, add foregrounds, simulate the 
timc-ordcrcd-data stream from the WMAP satellite, and 
then run it through the full map-making and foreground 
removal pipeline. We take a simpler approach; we sim- 
ulate the result of that process by a Gaussian CMB sky 
and then add either white or correlated noise to that 
sky. This is acceptable because the full WMAP pipeline 
does a good job of reconstructing some characteristics of 
the true CMB sky, and we are careful to make sure our 
statistics depend only on those characteristics. 

One possible criticism of LW04 is that we used uncor- 
rected noise in our CMB simulations. There are good 
reasons for using uncorrclated noise: the noise is stated 
to be white noise, and the primary correlations are at 
angular scales of about 141 degrees and are on the order 
of 0.3% For this paper, however, we compare the 

results of white and correlated noise. 

The WMAP team has provided 110 publicly available 
[20j correlated noise simulations that can be used to cal- 
culate the correlated noise on our averaged W-band map. 
We take each of the simulated noise maps provided by the 
WMAP team and average and rescale them. Since there 
were 4 channels in the W band, we do an unweighted 
average of the noise maps. Then we rescale the noise, 
pixel-by-pixel, so that the number of effective observa- 
tions (amount of noise in that pixel) exactly matches the 
amount quoted in the measured W band. This is nec- 
essary because the number of observations in the simu- 
lated data provided by the WMAP team does not exactly 
match the number of observations in the measured data 
|2lj . This requires about a 2.5% decrease in the noise. 
Since the correction is approximately the same in each 
pixel, this rescaling should maintain the correlations in 
the noise. 

Our white noise simulations model the noise as an un- 
weighted average of Gaussian noise in each of the four 
W-band channels. 



C. Data Reduction 

Care must be taken in our reduction of the data to a 
single statistic. Specifically, the reduction process should 
ignore data contaminated by galactic foregrounds, and 
should be insensitive to monopole and dipole moments. 
In this section we describe our data reduction process: 
lowering the resolution of the HEALPix[3(| map, ignor- 
ing data contaminated by Galactic foregrounds, remov- 
ing the monopole and dipole moments, finding the lo- 
cal maxima and minima in temperature, and calculating 
statistics on these local extrema. 

Lowering the resolution allows us to investigate the 
dependence of our results on different angular scales. 
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Specifically, we can see how the white noise and corre- 
lated noise models behave at different resolutions. In 
this case, lowering the resolution simply means doing an 
unweighted average of all the smaller pixels inside the 
larger degraded pixel. 

We use several masks to be sure we have removed all 
the effects of the Galactic foregrounds. See Figure ^ 
We start with the kpO mask, which we must degrade to 
a lower resolution. Every degraded (larger) pixel which 
contains any of the original kpO mask is also masked; we 
call this degraded mask the "paranoid" mask. We extend 
this mask by one pixel in all directions to get a "paranoid 
extended" mask, which will be useful later. We do not 
apply our analysis to different hemispheres for our one- 
point statistics, because at low resolution the degraded 
masks would block too much of the sky. 

We remove the monopole and dipole moments from 
the map outside the paranoid mask, and then find the 
local extrema. Since these extrema are defined by their 
neighboring pixel values, extrema next to the paranoid 
mask are dependent on pixels we wish to mask. To be 
completely independent of masked pixels, we ignore all 
extrema inside the paranoid extended mask. 

Finally, we calculate statistics on the local extrema as 
in LW04. We use a one-point analysis involving statistics 
that ignore their angular distribution. For the hot spots, 
we use the number of hot spots, as well as the mean 
temperature, and variance, skewness and kurtosis of the 
temperatures. We calculate the same statistics for the 
cold spots. For completeness, we give the formulas here, 
where angle brackets represent an average over all hot 
(or cold) spot temperatures t: 



D. One-Point Multi- Resolution Results 

From the cumulative distribution functions (CDFs) in 
figure |21 one can see how the difference between corre- 
lated noise and white noise changes with resolution. The 
resolution has a dramatic effect on the number of ex- 
trema and on their mean value. At lower resolutions, the 
CDFs for the white noise and correlated noise look very 
similar, as one would expect. 

It is true that switching from white noise to corre- 
lated noise reduces our original detection of the hot and 
cold spots not being hot and cold enough. However, the 
switch to correlated noise reduces the number of hot and 
cold spots in the simulations, so now we have too many 
extrema. Whether or not there are too many extrema 
also seems to be heavily dependent on the resolution at 
which we find the extrema. 

At all scales, we do find the variance of the extrema to 
be slightly low. 

Because we have only 110 correlated noise samples, we 
cannot claim any 95% fluctuations for this noise model. 
We can claim occasional 95% fluctuations for the 800 
white noise samples we have, but in light of the dramatic 
differences between the CDFs, unlikely statistics from 
the white noise simulations are more likely to be indica- 
tions of an incorrect noise model than of non-Gaussianity. 
Nonetheless, we do calculate where 95% fluctuations oc- 
cur and mark them in figure [3] 



E. Varying the Amplitude of the Noise 



mean = (t) 
variance = ((t - (i)) 2 ) 
((t-(t)) 3 ) 



skewness 



kurtosis 



variance 3 ^ 2 



variance 



(1) 



The process of reducing the data to a single statistic is 
summarized by the following list. We use identical meth- 
ods to reduce both the WMAP data and the simulations. 

1. Degrade the map resolution the desired resolution. 

2. Remove the dipole from the map outside of the 
paranoid mask. 

3. Find the local extrema. 

4. Ignore local extrema inside the paranoid extended 
mask. 

5. Calculate statistics of the remaining local extrema 
as usual. 



We also check the effect of varying the amplitude of 
the noise. The WMAP team quotes an uncertainty of 
the noise amplitude of 0.06% (at one standard deviation) 
|19| pjlj . When including this in our analysis of the mean 
of the extrema, we find that the WMAP mean statis- 
tics are still qualitatively low, but our results are very 
sensitive to this value. The numerical results are given 
in figure 0] and plots of the CDF functions are shown in 
figure El 

Again, with only 110 samples, our statistics are not 
strong enough to claim a 95% fluctuation. In some cases, 
it is clear that more samples will not help. For example, 
three simulations out of 110 have more cold spots than 
the WMAP data, at N side = 512. The number of cold 
spots in the WMAP data will not likely be a three sigma 
fluctuation; it will probably be just under two sigma. 
For the hot spots, however, all of the simulations have 
fewer maxima than the WMAP data. In this case, more 
correlated noise simulations would be useful to determine 
the significance of this result. 

Use of the correlated noise has two effects. It makes 
the fluctuation in the mean statistic much less significant 
(if we also allow a one sigma shift in the noise amplitude) . 
Secondly, it indicates that there are too many extrema 
in the WMAP measured CMB, a result not seen in the 
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FIG. 1: These are the 5 masks used for the one-point correlations. Left column is mollweide projection of the sky; right column 
is HEALPix base tile 6, where the upper left corner is northernmost. Tile 6 is directly opposite the galactic center; it's solid 
angle is exactly 1/12 of the full sphere's. Paranoid mask is black, extended paranoid mask extends it in grey. 
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FIG. 2: Cumulative distribution functions of the hot and cold spot statistics from various simulations. The grey (black) dashed 
line is the CDF for the hot (cold) spots from the 800 white noise simulations. The grey (black) solid line is the hot (cold) spot 
CDF from the 110 correlated noise simulations. The vertical grey (black) line gives the location of the WMAP statistic for the 
hot (cold) spots. Estimates p of the probability of a simulation's statistic falling below the WMAP statistic are printed on the 
graph and underlined with the appropriate line. All simulations use the best fit power spectrum, the kpO mask (degrading is 
detailed in the paper), and data from the W band. Units for the mean statistics are mK and units for the variance are mK 2 . 



white noise. This second effect is not affected by our 
shifts in the amplitude of the noise. 



IV. SMOOTHING ONE- AND TWO-POINT 
ANALYSIS 



The process of simulation and data reduction is very 
similar to the previous one, except that we smooth in- 
stead of changing resolution. The smoothing increases 
the signal to noise ratio and therefore reduces our sen- 
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Number Mean Variance Skewness Kurtosis 



white corr. white corr. white corr. white corr. white corr. 



32, max 





274 





291 





407 





309 





142 





136 





391 





427 





114 
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32, min 
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773 





078 





082 





928 





927 





161 





136 


64, max 





027 
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582 





473 





018 





000 





390 





482 





940 





964 


64, min 
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109 





334 





155 





036 





009 





887 





827 





441 





491 


128, max 





995* 





982 





675 





318 





018 





018 





141 





100 





990* 





973 


128, min 





983 





936 





109 





045 





044 





027 





794 





800 





950 





936 


256, max 





044 





191 





750 





155 





061 





027 





610 





591 





900 





900 


256, min 





374 





536 





829 





264 





065 





055 





822 





745 





683 





709 


512, max 





080 





973 





006* 





082 





136 





082 





426 





527 





385 





445 


512, min 





741 


1 


000 





000* 





027 





301 





191 





853 





845 





627 





682 



FIG. 3: Estimates p of the probability p that a simulation statistic will be lower than the WMAP statistic. Column headings 
signify the statistic and whether the noise was white or correlated in the simulations. For this table, we used 110 correlated 
noise simulations and 800 white noise simulations. Rows labels signify the value of N s ide and whether the statistics are for 
maxima or minima. Values of p that are significant for our 95% level test have asterisks. Only the white noise has enough 
simulations to enable a 95% detection. The first 6 columns of data are presented graphically in figure |5] 
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991 
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982 
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000 


1 


000 
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000 
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000 
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000 
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000 


Mean max 
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509 
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Mean min 
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064 
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145 





136 





127 





091 





055 





127 
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Variance min 





291 
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191 





227 





245 
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509 
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373 





455 
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909 
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827 
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Kurtosis max 
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FIG. 4: Number of standard deviations by which the amplitude of the correlated noise was shifted. Hinshaw et al. Hill cite 
the error in the noise amplitude to be 0.06%, so — 3a corresponds to multiplying the amplitude by exactly 0.9982, and 3a 
corresponds to multiplying by exactly 1.0018, etc. This multiplication is carried out after the proper scaling of the correlated 
noise, discussed in section UlI Bl The data for the first two statistics is presented graphically in figure |S] 




FIG. 5: This figure shows the cumulative distribution functions for the statistics generated by shifting the correlated noise 
amplitude by na, where n ranges from -3 to 3. 110 correlated noise maps are used. The grey statistics (and dotted lines) are 
for maxima, the black (and solid lines) for minima. From top to bottom, the numbers are values of p for n = 3 to n = —3. The 
vertical lines are the WMAP values. Note: Only the Mean statistic (plot on the right) shows the CDFs spread apart from each 
other; the other 4 statistics have the CDFs on top of each other, as in the left plot. Also, the detection of non-Gaussianity in 
the mean is weaker when you consider a, —la shift in the correlated noise amplitude. The CDFs would become smoother with 
more than 110 simulations. 
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sitivity to the noise model. Also, we are free to choose 
the smoothing scale without being tied to the discrete 
pixel sizes of the HEALPix scheme. We again describe 
the simulation and data reduction process. 

Our simulation process has only two steps: creating 
a CMB sky and adding white noise. Again, this is an 
acceptable approximation of the WMAP data if our final 
statistics only depend on the accurate characteristics of 
this approximation. To assure this in our data reduction, 
we again ignore the monopole and dipole moments, as 
well as the region contaminated by galactic foregrounds. 

To mask the sky and check for large-scale anisotropies 
in the statistics, we extend the kpO mask to different 
hemispheres, as in LW04. This yields four other Galac- 
tic masks: Galactic North and South masks (GN, GS) 
and Ecliptic North and South masks (EN, ES). See fig- 
ure [7] When masking a CMB map, we set the tem- 
perature fluctuations inside the mask to zero to assure 
that later smoothing does not allow contamination in the 
masked region to leak out. To remove dependence on the 
monopole and dipole, we also remove these moments out- 
side of the galactic mask we use. 

To remove dependence on the small scale structure of 
the noise, we smooth the sky, with either a 50 arcminutc 
or 3 degree Full Width at Half Maximum (FWHM) beam. 
To choose the smoothing scale, we arbitrarily decided to 
suppress the power by a factor of 10 at the multipole I 
value where the signal to noise has dropped to a ratio of 
1. The signal to noise is 1 at about I = 350, so we choose 
a Gaussian smoothing FWHM scale of 50 arcminutes. As 
seen in figure^] this suppresses the CMB power spectrum 
by a factor very close to 10 at I = 350. We also check our 
results with 3 degree smoothing scale, where the noise is 
entirely subdominant. 

After the smoothing, we find the maxima and minima. 
However, the smoothed temperature map and therefore 
these extrema will be affected by the zeroed pixels in- 
side the mask, so we want to ignore extrema that are 
significantly affected by the presence of the mask. To do 
this, we create an adjusted mask. We smooth the original 
mask (kpO, GS, GN, ES, or EN) with the same FWHM 
Gaussian beam as we will use to smooth the CMB, and 
we mask all areas with values less than 0.9. Recall the 
convention that unmasked pixels have a value of 1 and 
masked pixels have a value of 0. When we ignore ex- 
trema inside this adjusted mask, we ignore most of the 
extrema which have been significantly affected by being 
close to a region of zeroed pixels. Our value of 0.9 is 
less conservative than Eriksen et al. [2^J who use 0.99. 
This value affects how strictly we want to ignore mask 
effects. It does not affect the significance of our results, 
because the simulations and the WMAP data are treated 
identically. 

The process for simulating and and reducing the maps 
is as follows: 

1. Simulate a map with N S id e = 512, £ ma x = 700 (or 
tmax = 300, for 3 degree FWHM smoothing), and 
the WMAP measured power spectrum. 



2. Add in white noise, according to the effective num- 
ber of observations on each pixel. 

3. Set the temperature fluctuation to zero inside a 
galactic mask. 

4. Remove the monopole and dipoles outside of that 
same mask. 

5. Smooth with a 50 (or 180) arcminute FWHM Gaus- 
sian beam. 

6. Find the local extrema. 

7. Discard extrema inside the adjusted version of the 
mask in step 2. 

8. Calculate statistics on the extrema for further anal- 
ysis. 

A few images of the various stages of this simulation pro- 
cess are shown in figure|SJ We compare these simulations 
to the WMAP cleaned temperature map data, which goes 
through the same process, starting at step 3. 

A. One-Point Statistics 

We performed 4000 simulations of Gaussian CMB 
skies. A 99% detection would require fewer than 10 of the 
statistics to be below (or above) the WMAP statistic. A 
95% detection only requires fewer than 84 of the statis- 
tics to be below the WMAP statistic. See the appendix 
for details. 

Some results for the one-point statistics are shown in 
figure^ There are no 99% detections, but there are sev- 
eral 95% detections. For the mean statistic, the hot spots 
do not seem particularly unusual, but the cold spots are 
too warm in the Galactic North at 50 arcminute smooth- 
ing. We also have 95% detections in the Ecliptic North 
with the hotspots not having enough variance at 50 ar- 
cmin smoothing and the cold spots not having enough 
variance at 3 degree smoothing. With respect to skew- 
ness, the cold spots have too much negative skewness at 
both smoothings. It is possible that this is caused by 
a single very cold cold spot, perhaps that described by 
[23^ . The hot spots have too little skewness at the 3 
degree smoothing. 

We calculate 100 one-point statistics, and 7 of them 
give 95% detections, so one could argue that these detec- 
tions are not highly significant. Nevertheless, it is inter- 
esting to see that the detections support previous results, 
such as the lack of power in the Ecliptic North^fj- 

B. Two-Point Statistics 

1. Calculating Two-point Functions 

For the two-point functions, wc perform two general 
analyses on the extrema. The first is related to the 
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FIG. 6: The solid line is the measured temperature-temperature power spectrum from the first year WMAP data. The dots 
are the average (over 110 simulations) of the W band correlated noise power spectrum. Note the ringing, which is due to the 
pixelization interacting with the scanning strategy. The dashed line is the unitless suppression of power (due to a 50 arcminute 
FWHM Gaussian smoothing), which has been rescaled to fit the height of the graph. Note that at I = 350, where signal to 
noise is about 1, smoothing suppresses power by a factor of about 10. 



method used by Heavens and Sheth |24| where they look 
at the point-point correlation function of the locations of 
the maxima above a certain threshold (and minima be- 
low a threshold). Wc arbitrarily pick a threshold of 2a, 
where a is the standard deviation of all the temperatures 
in the map outside the (non-adjusted) mask. 

The other analysis is where no threshold is applied to 
the extrema and the two-point statistics of the tempera- 
ture field at the locations of the extrema are calculated. 
It has been proposed that this two-point function is very 
close to the two-point function on the full sphere |2^| . 

In both analyses, the statistics we calculate are moti- 
vated by the concept of a correlation function. We sim- 
ply find the average number of pairs at a given angular 
separation, or the average product of spot temperatures 
at some separation. We calculate three statistics of this 
form: between maxima and maxima, between minima 
and minima, and between maxima and minima. 

Consider the spot-spot statistics between maxima and 
minima, for example. We select all pairs with one hot 
and one cold spot and find the angles between the spots. 
These angles we bin into 1000 equally spaced bins of an- 
gular separation between and tt radians. To remove 
dependence on the number of spots, wc normalize the 
histogram we just made by dividing by the total number 
of counts. This makes the bins sum to 1. Erikscn et al. 



[26j have already studied the Minkowski functionals on 
the CMB, which are related to the number of spots above 
a threshold, so we did not feel the need to study it fur- 
ther by including information about the number of spots 
in our data set. The normalized histogram contains the 
correlation information, but it is slightly dependent on 
the pixelization and highly dependent on the geometry 
of the mask we used. 

Suppose we wanted to calculate the true underlying 
correlation function, independent of the geometry of the 
mask. We do not need to do this in our statistical analy- 
sis (and in fact we do not) , since we have exactly the same 
geometric masking effects included in both the WMAP 
data and simulations. Nonetheless, if we wanted to de- 
termine the mask-independent correlation function, wc 
would need to know the effects of the mask. For this 
purpose, we bin the angles between pairs in a random 
distribution of "maxima" and "minima" for each CMB 
simulation. The number of randomly placed extrema is 
determined by the number of extrema found in that sim- 
ulation. The underlying correlation function is the excess 
probability of finding pairs at a given angle. To obtain 
this for each angular bin, we divide the normalized num- 
ber of pairs from the CMB simulation by the average nor- 
malized number of random pairs in that bin, and subtract 
1 . This gives us a mask independent correlation function 
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FIG. 7: These are the 5 masks used for the two-point correlations. Left column is mollweide projection of the sky; right column 
is HEALPix base tile 6, where the upper left corner is northernmost. Tile 6 is directly opposite the galactic center. The mask is 
black, the adjusted mask for 50 arcminute FWHM smoothing includes the mask and the thin grey region extending the mask. 
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FIG. 8: These are images of base tile 6 in the HEALPix scheme at various steps in the simulation process. North is in the 
upper left. Range of color scale varies between some images. 

for each iteration, which we can use to visually examine can also be turned into a correlation function, if only for 
our results. visual examination. 

Calculating the temperature-temperature two-point 
statistics is very similar to calculating the spot-spot 
statistics. Instead of counting the number of angles that 
fall into a given bin, we find the average product of tem- 
peratures for pairs of spots in that angular bin. This 
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FIG. 9: These are cumulative distribution functions of selected one-point statistics for 50 and 180 arcminute smoothing levels. 
Hot spot (maxima) statistics are in red and are the upper number in each pair. Cold spot statistics are in black. The vertical 
lines indicate the position of the WMAP statistic. Mean statistics are given in mK, variance is in mK 2 , and skewness is 
dimensionless. The mean and skewness values of the minima have been multiplied by —1 before creating CDFs, for easier 
comparison with the maxima. 



2. Reducing Two-point Functions to a Single Number 

Now we must reduce a 1000 dimensional discretized 
two-point statistic into a single statistic, a single real 
number. We treat £ as a 1000 dimensional vector. First, 
we reduce the dimension by ignoring some of the data. 
We do this in two ways: by re-binning the vector into 
40 bins, and by ignoring all but the first 40 of the 1000 
bins. After this, we calculate a \ 2 value for the lower 



dimensional £ vector, based on the covariance C of those 
vectors. We define x 2 = £ T C _1 £- Since we must be sure 
to treat the WMAP two-point vector in the same way as 
the simulation vectors, and we do not want to include it 
in the calculation of the covariance matrix, we must use 
some simulation vectors to define the covariance matrix 
and calculate our \ 2 statistics with the rest. We calculate 
the covariance matrix C with the first 1000 vectors £ and 
then find where the WMAP data lies in the distribution 
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of x 2 values of the rest of the £ vectors. We also visually 
check that the distribution of \ 2 values from the vectors 
used to make the covariance is not excessively different 
from that of the rest of the vectors. This verifies that we 
are using enough vectors to define C. 

C. Two-point results 

Our results here are for 4000 simulations. The first 
thousand go to define the covariance matrix, so the 
WMAP statistic is compared to the statistics for the re- 
maining 3000. A 95% detection requires < p < 0.02 or 
0.98 < p < 1, and a 99% detection requires < p < 0.002 
or 0.998 < p < 1, where p = i/n is the same as in our 
previous paper and is also defined in the appendix. 

The most interesting result is in the temperature- 
temperature correlation function for the kpO masked full- 
sky correlation function. For 3000 iterations, the WMAP 
statistic fell lower than all of them. To better determine 
the significance of this result, we ran a set of 20,000 sim- 
ulations for this particular mask. Again, the first 1000 
went to determine the covariance matrix. Of the remain- 
ing 19,000 simulations, 13 of their statistics fell lower 
than the WMAP statistic. This is extremely close to a 
3a detection, it comes to be 2.989cr. 

One interesting point about this result is that our \ 2 
statistic is too low. This means that instead of fitting the 
distribution of two-point functions too poorly, it instead 
fits them too well. In fact, it fits the covariance matrix 
describing the two-point functions better than the two- 
point functions that went into making that covariance 
matrix. This is a very unusual result. 

It is also curious that we only see this for the kpO mask 
and not for the masks in which we cut out half of the sky. 
This suggests that our effect is different from those that 
led to recent claims of anisotropy in the CMB. 

To check to see if this result is an effect of foregrounds, 
we repeat our simulations for 3000 iterations in the V- 
band, just for the kpO mask. We take 1000 simulations 
for the covariance matrix, and find the position of the 
WMAP statistic among the remaining 2000. We find 
that the min-min temp-temp WMAP statistic on the full 
sky then falls just above 5% of the simulation statistics. 
However, we find that if instead of the min-min statistics 
we look at the min-max (cross) , then we find that it now 
sits lower than all 2000 of the simulation statistics. 



V. CONCLUSIONS 

In this paper we present frequentist hypothesis tests 
to check for non-Gaussianity in the CMB using the one 
and two-point statistics of hot and cold spots at several 
angular scales. We use and advocate a robust statistical 
test that reduces the probability of a false detection of 
non-Gaussianity to a level commensurate with the signif- 
icance of the detection. 



A Mathematica notebook and small user-friendly 
C program are available for determining the signif- 
icance in this robust hypothesis test where a mea- 
surement is compared to Monte-Carlo simulations. 
The method and C code are described in the ap- 
pendix. The code is available at the web page 
http : / / cosmos, astro.uiuc.edu/~dlarsonl / facts/l 

While the WMAP hot spots are too cold compared to 
the white noise simulations, this is no longer as dramat- 
ically true when we use the correlated noise simulations. 
The detection drops to below 2a. Instead, we find that 
there are now too many hot and cold spots in the WMAP 
data. We cannot give this qualitative statement a signifi- 
cance at or above 2a because we only have 110 correlated 
noise simulations. We also find that the variance contin- 
ues to be qualitatively low. 

While the nature of our result is sensitive to changes 
in the noise model, the significance is not. Allowing the 
amplitude of the white noise to vary by two standard 
deviations puts the CDF squarely around the WMAP 
measurement. This means our result did not go away; 
we may have error either in the amplitude of the noise 
or from statistical fluctuations in the CMB, but we still 
need to move something by two standard deviations to 
make them match. 

When we switch to the properly correlated noise sam- 
ples, regardless of their amplitude (within three stan- 
dard deviations) we definitely get too many cold spots, 
and probably too many hot spots. This was unexpected, 
since the white noise did not show an unusual number of 
hot or cold spots. 

Our main result comes from our investigation of the 
two-point statistics. We find an anomaly in the full- 
sky minima-minima temperature-temperature two-point 
function using the kpO mask. This is a very large fluc- 
tuation, unlikely at the 3 sigma level. We observe this 
anomaly only on the full sky. This suggests that our ef- 
fect is distinct from those that led to recent claims of 
anisotropy in the CMB. 

In addition to this 3-sigma result, we also have several 
2-sigma results. For example, we see low variance of the 
hot spots at 50 arcminute smoothing, the high skewness 
of the cold spots at both 50 and 180 arcminute smooth- 
ings, and a low "x 2 " statistic for the point-point func- 
tion between maxima and minima for the 180 arcminute 
smoothing (a 2.7 sigma result). 

We have demonstrated that there is a statistically 
highly significant difference between the WMAP data 
and our Gaussian Monte-Carlo simulations. This can be 
interpreted in one of three ways: it is just a large fluc- 
tuation, it is caused by non-Gaussianity, or it caused by 
some other unknown foreground or systematic effect that 
we do not consider in our model of the WMAP data. 
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FIG. 10: Upper left: temperature-temperature correlation, minima, full sky. Upper right: spot-spot correlation, maxima, full 
sky. The vertical axis is the excess fractional probability density, for finding a pair of points at a given angular separation. In 
the correlation functions, the black line is the WMAP data, the white lines are the median simulation values, and the grey 
band is a 2cr error band calculated from the simulations. Lower: Cumulative distribution functions for \ 2 statistics. They 
correspond to the upper plots. 
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APPENDIX A: USER'S GUIDE AND 
STATISTICAL METHODS 

This appendix contains a brief user's guide for the 
facts program in section [A II and a more detailed ex- 
planation of our statistical test in section lA~2l In section 
IA 31 we connect our discussion to the derivation of fre- 
qucntist confidence intervals in LW04, and we close with 
a brief conceptual comment on tests of non-Gaussianity 



in section lA~4l 

1. Users Guide for facts 

To aid in the calculation of significance, we provide a 
publicly available code written in c. It is named facts, 
which stands for a Frequcntist's Ally for the Calculation 
of Test Significance. To compile the code on a typical 
linux system, unzip and untar the file, enter the facts 
directory which was created, and type make. This will 
make the executable facts. The program is small enough 
that a makefile is not necessary, but I include it for con- 
venience. 

The syntax for the command can be retrieved by exe- 
cuting facts without any command line arguments. The 
program takes from three to five arguments. 

facts s n i [alpha [beta]] 

• s: this value is either 1 or 2 for a single or double 
sided test. 
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FIG. 11: These are our estimates p of the position of the 
WMAP two-point statistic among the simulated statistics. 
These results are for 50 arcminute FWHM smoothing, where 
1000 iterations went to create the covariance matrix and po- 
sition of the WMAP statistic is found among the remaining 
3000. The different rows show results for different masks, as 
well as the minima-minima, maxima-maxima, and minima- 
maxima statistics. The columns show the results for the 
nearest 7.2 degrees (first 40 bins) of our two-point statis- 
tics, as well as for the full-sky 180 degree two-point statistics. 
Columns also show the different results for the spot-spot and 
temperature-temperature statistics. Note the value in the 
upper right corner. 
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FIG. 12: This is the same data as figure ITTI except for 180 
arcminute FWHM smoothing. 



• n: this is the number of simulated statistics calcu- 
lated. 

• i: this is the number of simulated statistics that fell 
below the test statistic. If i is a negative number, 
the code prints out information on what values of 
i will reject the hypothesis. 

• alpha: this is a, number between and 1 that pa- 
rameterizes our hypothesis. For a single sided test, 
the hypothesis is p G (a, 1]. For a double sided 
test, the hypothesis is p G (a/2,1 — a/2). If not 
specified, the default value is a = 0.05. 

• beta: this is a number between and 1 that gives 
an upper limit on the probability of a type I error 
(rejection of the hypothesis when the hypothesis is 
actually true). If not specified, it takes the value of 
a. 

The code takes the values of s, n, a, and (3 and con- 
structs a test. If i satisfies that test then the hypothesis 
is accepted, otherwise it is rejected. The code prints out 
information whether i satisfies the test, and what the 
minimum values of a and (3 are for which i will satisfy 
the test. 

A few examples follow. The next command determines 
whether a two-sided test with a 0.02 maximum proba- 
bility of a type I error will accept the hypothesis that 
p G (0.005,0.995) when only 17 out of 10000 statistics 
fell below the test statistic. 

facts 2 10000 17 0.01 0.02 

The command below determines if 5 out of 1000 statistics 
falling below the test statistic is few enough to reject the 
hypothesis p G (0.025,0.975) when the test has a 5% 
maximum chance of a type I error. 

facts 2 1000 5 

While this program is reasonably fast, it is not opti- 
mized for pathological requests. It is slower for larger 
values of n, i, a, and (3. 



2. Statistical Testing 

a. The Problem 

The statistical analysis we discuss in this paper can be 
reduced to the following problem. We are given a few 
thousand (= n) random numbers {xj}, j = 1 . . . n which 
have come from some random number generator (distri- 
bution), with some probability density function (PDF) 
f(x). We are also given a single number xq and asked 
to determine if we have any reason to believe, statisti- 
cally, that it may not have come from that same random 
number generator. 
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b. The Solution 

Because this is a statistical problem, we cannot do 
what we naturally want to do: prove that xq was or was 
not chosen from the same distribution as the {xj}. In- 
stead we must settle for a weaker statement about how 
large a fluctuation xq is, if it were drawn from the distri- 
bution of the {xj}. This appendix describes how to make 
statistical statements about the size of this fluctuation. 

We begin by assuming that the single number xq did 
come from the random number generator that produced 
the {xj}. Given this, we attempt to determine how large 
a statistical fluctuation xq is. There are several ways to 
do this. We will use a completely standard (frequentist) 
hypothesis test. Our hypothesis concerns a parameter, 
p, which describes how large the fluctuation is. We per- 
form a test on a random variable, i, whose distribution 
is affected by p, and, based on the results of that test, 
decide whether or not to accept the hypothesis. 

This method is useful if the pdf is sufficiently complex 
that it is not practical to evaluate it analytically. 

c. The Random Variable and Its Distribution 

The method we use requires knowing only how many 
of the random numbers Xj fell below xq. Let there be n 
random numbers and let i of them fall below xq. Then i 
is our random variable, chosen from the binomial distri- 
bution P(i\p, n), where 

P(i\p,n)= U - py*"' (Al) 
i\(n — 1)1 

and where p gives the position of xq in the PDF f(x): 

/Xq 
Ox) dx (A2) 
-00 

We can estimate p with p = i/n, which is a maximum 
likelihood and unbiased estimator. 



d. The Hypothesis 

Our hypothesis concerns the value of p. As previously 
stated, we would prefer to pick u xq was chosen from the 
same distribution as the {xj} values" as our hypothesis. 
Since the negation of this hypothesis is extremely difficult 
to work with, we choose a simpler hypothesis, about p: 

H : pe (a/2, 1 - a/2) (A3) 

where a is much less than 1. Here we use a double sided 
hypothesis for our test; the case of a single-sided test is 
discussed in section lA 2 fl To a frequentist, the hypothesis 
Hq is cither true or false, so it either has a probability of 
1 or 0. The statistical test described in this appendix will 
then be useful to decide whether to accept or reject the 



hypothesis. The frequentist statement is that "Hq is true 
in a fraction 1 — a of all possible Gaussian Universes" . 

It is certainly true that we could have chosen our hy- 
pothesis to be anything of the form p 6 S where the 
set S C [0,1] has total length (or measure) 1 — a. We 
choose S = (a/2, 1 — a/2) because of our natural incli- 
nation to think that values of p far out in the tails of the 
distribution are unusual. Also, if we had reason to be- 
lieve that xq were drawn from a distribution whose mean 
was many standard deviations away from the distribu- 
tion of the {xj}, our hypothesis would be a powerful test 
of whether xq was drawn from the same distribution as 
the { Xj }. 

e. The Test and Types of Error 

Our test must be of the form where we accept Hq for 
certain values of i, i G /, and reject it for all others, i G I. 
Here, / and I are disjoint and I U I = {0, 1,2,..., n}. 
With a statistical test of this form, one is interested in 
the errors of type I (rejection of a true hypothesis) and 
type II (acceptance of false hypothesis). I will call their 
probabilities (3 and 7, respectively. 





test accepts Hq, 


test rejects H , 




i e 1 




Hq is true 


1-/3 


a 


Hq is false 


7 


1-7 



Explicit calculation of these probabilities requires us 
to specify a test. Our test is similar in form to that of 
our hypothesis. We accept Hq whenever 

ie{io + Mo + 2... ) n-i -l} (A4) 

for some specified value of iq. Since calculation of (3 and 
7 requires knowledge of n, p, and the test used, (3 and 7 
are functions of these three values. 

(3 = P(i Q ,p,n) j = j(i a ,p,n) (A5) 

It is desirable to adjust the test (the value of io) so 
that (3 and 7 are as small as possible for all values of p. 
Unfortunately, these cannot both be made small. The 
probability of the test accepting Hq must vary contin- 
uously as p varies continuously from the region from 
where H is true to where it is false. This means that 
f3(iQ,p, n) = 1 — 7(10, p, n) at p = a/2 and p = 1 — a/2. 
We must decide which we want to be small. For this pa- 
per, we choose to make (3 small. This all but eliminates 
the possibility of rejecting Hq when it is actually true. As 
a trade-off, we have the problem of accepting Hq when it 
may be false. 

We can construct an explicit expression for (3: 

io n 

f3(i ,p,n) = J2 p ( i \P> n )+ p ^ n ) ( A6 ) 

i— i—n — io 
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We want to limit /3(io,p,n) for any value of p satisfying 
H , which means we want to limit /3(io,a/2,n). There 
are two approaches to this: to fix the hypothesis (a) and 
change the test (io), or to fix the test and change the hy- 
pothesis. Suppose we are given the hypothesis and want 
to determine a test that keeps j3 below some bound. This 
requires one to progressively increase io while checking 
that f3(io,a/2,n) remains below the desired bound. Al- 
ternatively, if we have done our experiment and want to 
determine the highest possible significance we can assign 
to it, then we know i and want to find a. Specifically, 
we set io to our measured value of i, and find the small- 
est value of a such that (3 is still below some desired 
bound. For simplicity, we could set a = (3, since we want 
both values to be low for a highly significant result, and 
numerically solve for a: 

(3(i ,a/2,ri) = a (A7) 

We can then claim that our test has rejected the hypoth- 
esis, and has a probability of a type I error (rejecting a 
true hypothesis) of a = /3. 

For computational purposes, it is useful to note that 
one of the sums in equation ^6] will contribute very little 
to the value of f3. Let us consider the contribution of 
the second sum, when it is the smaller of the two. Its 
maximum value (while still being smaller) occurs when 
p = 1/2. We will also estimate the integer io + 1 w na/2. 
The value of io + 1 will typically be lower than this. 

n 

P(i\a/2,n) < (i + l)P(n - i \a/2, n) 

i=n—io 

< ( l0 + iK +i (i/2r 

< {na/2)n na / 2 {\/2) n 

< exp{ln(na/2) + (na/2) ln(n) 
+nln(l/2)} (A8) 

We want this to remain small. To assure that the an In n 
term does not become larger than the n ln(l/2) term as n 
increases, we must have a < k/ ln(n) for some k. Now we 
can check the contribution of the second sum (equation 
IA6|) to (3 for some reasonable bounds: 

a<l/\nn n > 100 i < ero/2. (A9) 

We find the value of the second sum in eauation lA6l to be 
well below 10~ 40 whenever these conditions are satisfied. 
Since the probabilities we test are all much higher than 
10 -40 , it is safe to ignore the second sum in eauation lAGI 
when p <C 1/2. By symmetry, it is safe to ignore the first 
sum when p 3> 1/2. The facts code makes use of this 
information by ignoring the second sum and mapping its 
input to a problem where only the first sum is important. 

/. The Case of a One-Sided Test 

It may be useful for other applications to do a one- 
sided analysis, for example if one will only consider a 



statistic xo to be unusual if it is lower than most simu- 
lated statistics. Our work can be repeated for that case: 

H : pe (a, 1] (A10) 

We accept the hypothesis when i > io- 

io 

p(io,p,n) = J2 p ( i \P, n ) ( An ) 

i=0 

To specify a test, we require io to be small enough that 
the maximum value of j3, which is /3(io,a,n), is below 
some bound. Alternatively, to find the significance of 
previously obtained results, we set io to be our measured 
value of i, set a ~ (3, and solve 

f3(i ,a,ri) = a (A12) 

to find the maximum significance of our test. As in the 
double sided case, we can claim that our test rejects the 
hypothesis Ho , and our test has a probability of a type I 
error (rejecting a true hypothesis) of a = (3. The analysis 
for a single sided confidence interval Ho : p G [0, 1 — a) 
can be mapped by symmetry to the above problem. 

3. Connecting to Frequentist Confidence Intervals 

For an understanding of our statistical test that is suf- 
ficient to do calculations, the previous section is enough, 
and the practical reader can stop here. For the enthu- 
siastic reader, we describe in this section how our test 
can also be derived using frequentist confidence intervals. 
This connects the preceding discussion to our alternative 
derivation for the calculations in LW04. 

We use the same formalism in the previous section. 
There are n simulated statistics, {xj}. Exactly i of these 
fall below the test statistic Xo- The true probability of 
another simulated statistic falling below the test statistic 
is p, which can be estimated by p = i/n. 

We construct an interval [p~,p + ] such that the true 
value of p will be inside the interval at least a fraction 
1 — (3 of the time. Note that this doesn't mean we think 
the probability of p G [p~,p + ] is at least 1 — (3 for the 
specific interval we construct, since p is not a random 
variable. For some given interval, either P(p E 

[p~ ,p + ]) — or P(p G [p~ ,p + ]) = 1, and we do not know 
which is correct. Instead it is helpful to think about many 
sets of numbers {xj} and the same xo- Each set will have 
n numbers, where ik of them fall below Xq- We construct 
Pk = ik/n as before. For each set we also construct the 
interval IpZiPk]- When we say the probability P(p G 
[p~,p + ]) > 1— P, this is to be interpreted with p~ andp + 
as the random variables, since they are both functions of 
the random variable ik- It is a statement about whether 
the interval falls around the true value of p and not about 
whether p falls in the interval. 

The procedure for constructing this interval [p~ , p + \ is 
detailed in chapter 20 of Kendall & Stuart [2^ . We state 
a few relevant results here. 
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FIG. 13: This figure plots 7, the probability of a false acceptance of the hypothesis (solid), and f3, the probability of a false 
rejection (dashed), as a function of p, for a — 0.003, io = 13, n = 19000. In order to keep (3 below a, we find that 7 becomes 
quite large for some values of p. In order to avoid false rejections of the hypothesis, we must allow false acceptances sometimes. 
The region in which 7 is large becomes smaller as n increases. 



We define a cumulative distribution function over the 
index i and a reverse cumulative distribution function as 
follows: 

i 

cdf(i,p,n) = J2 p U\P,n) (A13) 

3=0 



icd£(i,p,n) = Y i P(j\p,n) (A14) 

From here, we can pick a significance 1 — (3 for our test 
and then define p~ = p~ (i, n, (3) and p + = p + (i, n, ($) by 

cdf(i,p+,n) = (3/2 (A15) 
rcdf(«,p-,n) = (8/2 (A16) 

Note that here we have constructed a double sided con- 
fidence interval. If we wanted a single sided interval, for 
example p £ [0,p + ] with probability > 1 — /?, then we 
would have p + = p + (i, n, (3) defined by 

cd£(i,p + ,n) = /3 (A17) 

We will not prove this in mathematical detail, but we will 
restate an argument given by |2^| for the double sided 
confidence intervals. 

As [28| describes, figureEl carl be constructed horizon- 
tally and read off vertically. We construct it by fixing p 
and varying i (as we sum cdf() and rcdfQ) while we read 
off the numbers p~ and p + at fixed i. 

Consider a single value of p. Then look across at the 
values of i that are marked in blue at that p value. By 
construction, they have probabilities that sum to > 1—0. 
Hence for any p value, the probability of being in the 
blue band is > 1 — f3. But one is in the blue band if and 
only if p S Hence the interval will fall 

around the correct value of p with probability > 1 — 0. 
I'll rephrase it once again: At any specific p value, values 
of i that cause the interval to surround p will be 

chosen at least 1 — (3 of the time. 




FIG. 14: Double sided confidence intervals, a = 0.05, for all 
values of i from to n — 50. 



This frequcntist confidence interval can now be used to 
construct a test of our hypothesis, Hq: pG (a/2,1 — a/2). 
We reject the hypothesis if and only if our interval 
is entirely outside of the interval (a/2, 1 — a/2). 
In LW04, we required a double sided confidence interval 
to be outside of (a/2, 1 — a/2). This was too con- 
servative, since we only needed a single sided confidence 
interval to be outside of (a/2, 1 — a/2). That is, for low 
values of p, we reject Ho if [0,p + ] is completely below the 
interval (a/2, 1 — ct/2), which means p + < a/2. For high 
values of p, we reject Ho if [p~ , 1] is completely above the 
interval (a/2, 1 — a/2), which means 1 — a/2 < p~ . 

This is equivalent to the test previously discussed, as 
long as the reasonable assumptions in equation IA9I hold 
so that one of the sums in equation IA6I can be dropped. 



4. The Next Step 

We can now quantify how large a statistical fluctua- 
tion the number xo is among the random numbers {xj}. 
Given the number of statistics that fell below the test 
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statistic, we can test the hypothesis that xq is not in the 
tails (which have total probability a) of the distribution. 
If our test says that xo is indeed far out in the tails of 
the distribution, then we can decide not to believe that 
xq came from the same random number generator as the 
{xj}. This is in fact the route taken in our paper. 

A general caveat of any blind search for anomalies is 
that one cannot distinguish between a large statistical 
fluctuation and a genuinely different parent distribution. 
This is often casually referred to as the "non-dog" prob- 
lem of non-Gaussianity searches, since the class of things 
which are not dogs is as large and as difficult to deal with 
as the class of CMB models which are not Gaussian. The 
difficulty lies in defining something by negation. 

To illustrate this idea, suppose we find a detection for 



some very small value of a. Claiming that xq was not pro- 
duced by the original random number generator (RNG) is 
a dangerous thing to do without specifying exactly what 
random number generator did produce xq. Consider, for 
example, that the {xj} are Gaussian distributed with 
zero mean and unit variance, and xq = 10. If the only 
alternative RNG is one which also has zero mean, but 
has a variance of 10~ 100 , then what we had originally 
claimed as evidence against the original RNG would now 
be considered strong evidence for it. 

The ambiguity made apparent in this example holds 
true generally for any blind test of non-Gaussianity and 
is not just a feature of our work. Consequently, we must 
either come up with an alternative model to Gaussianity, 
or leave the interpretation to the reader. 
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